Using Multilingual Topic Models for Improved Alignment in English-Hindi MT
Abstract
Parallel corpora are often injected with bilingual dictionaries for improved Indian language machine translation (MT). In the absence of such dictionaries, a coarse dictionary may be required. This paper demonstrates the use of a multilingual topic model for creating coarse dictionaries for English-Hindi MT. We compare our approaches with: (a) a baseline with no additional dictionary injection, and (b) a corpus with a good-quality dictionary. Our results show that the existing Cartesian product approach, which is used to create the pseudo-parallel data, results in a degradation on tourism and health datasets for English-Hindi MT. Our paper points to the fact that the existing Cartesian approach using multilingual topics (devised for European languages) may be detrimental for Indian
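As a rough illustration of the Cartesian product idea mentioned in the abstract, the sketch below pairs every top English word of a topic with every top Hindi word of the same topic to produce pseudo-parallel word pairs. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `cartesian_pairs`, the `top_n` cutoff, the dictionary layout, and the toy topics are all hypothetical, and the abstract does not specify how topics are trained or how many words per topic are used.

```python
from itertools import product


def cartesian_pairs(topics, top_n=3):
    """Pair every top English word with every top Hindi word of each topic.

    `topics` is assumed to be a list of dicts holding ranked word lists
    under the keys "en" and "hi" (a hypothetical layout for illustration).
    """
    pairs = []
    for topic in topics:
        en_words = topic["en"][:top_n]
        hi_words = topic["hi"][:top_n]
        # Cartesian product: all (English, Hindi) combinations in this topic.
        pairs.extend(product(en_words, hi_words))
    return pairs


# Hypothetical toy topics, invented for this example only.
toy_topics = [
    {"en": ["hospital", "doctor"], "hi": ["अस्पताल", "डॉक्टर"]},
    {"en": ["temple", "tourist"], "hi": ["मंदिर", "पर्यटक"]},
]

pseudo_parallel = cartesian_pairs(toy_topics, top_n=2)
for en, hi in pseudo_parallel:
    print(en, hi)
```

In this toy run the output contains correct pairs such as ("hospital", "अस्पताल") alongside mismatched ones such as ("hospital", "डॉक्टर"), since the product pairs words without regard to whether they are translations of each other.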
Similar resources
That'll Do Fine!: A Coarse Lexical Resource for English-Hindi MT, Using Polylingual Topic Models
Parallel corpora are often injected with bilingual lexical resources for improved Indian language machine translation (MT). In the absence of such lexical resources, multilingual topic models have been used to create coarse lexical resources in the past, using a Cartesian product approach. Our results show that for morphologically rich languages like Hindi, the Cartesian product approach is detrime...
Supporting Large English-Hindi Parallel Corpus using Word Alignment
This paper describes a methodology for understanding parallel English-Hindi sentences using word alignment. This methodology is the foundation for developing a parallel English-Hindi word dictionary after syntactic and semantic analysis of the English-Hindi source text. The methodology of the proposed system is used for English and Hindi sentences; the methodology can also be used for othe...
Evaluating Machine Translation Evaluation’s BLEU Metric for English to Hindi Language Machine Translation
Machine Translation Evaluation (MTE) has been widely recognized by the Machine Translation (MT) community. The main objective of MT is to break the language barrier in a multilingual nation like India. Evaluation of MT is required for Indian languages because, owing to differences in language structure, the same MT does not work for Indian languages as it does for European languages. So, there is a great need to develop ...
Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech
In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images. These embeddings are learned directly from the waveforms without the use of linguistic transcriptions or conventional speech recognition technology. While prior work has investigated this setting in the monolingual case using English speech data, th...
Microsoft Word - 19. OK_Revised [RegDone-3-4_305]_Mapping Parallel English _11-03_ CR-S-R
In this paper, we present a methodology for one-to-one (1:1) mapping of parallel English-Hindi sentences. This methodology is based on the development of a parallel English-Hindi word dictionary after syntactic and semantic analysis of the English-Hindi source text. We are using this methodology for English and Hindi sentences, but the methodology can also be used for other l...